Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning a joint representation of diverse modalities remains a long-standing challenge in emerging machine learning applications and research. Previous generative approaches for multimodal input approximate the joint-modality posterior from uni-modality posteriors, as a product-of-experts (PoE) or a mixture-of-experts (MoE). We argue that these approximations lead to a defective bound for the optimization process and a loss of semantic connection among modalities. This paper presents a novel variational method on sets, called the Set Multimodal VAE (SMVAE), for learning a multimodal latent space while handling the missing modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and to compensate for the drawbacks caused by factorization. On public datasets from various domains, the experimental results demonstrate that the proposed method is applicable to order-agnostic cross-modal generation while achieving outstanding performance compared to state-of-the-art multimodal methods. The source code for our method is available online at https://anonymous.4open.science/r/SMVAE-9B3C/.
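To make the two factorized approximations mentioned above concrete, here is a minimal sketch of how one-dimensional Gaussian uni-modality posteriors could be combined as a product-of-experts or as an equally weighted mixture-of-experts. It is an illustration only and is not taken from the SMVAE code; the function names, the restriction to scalar Gaussians, and the equal mixture weights are all assumptions.

```python
def product_of_gaussians(means, variances):
    """Fuse 1-D Gaussian experts N(mu_i, var_i) by multiplying their densities.

    The renormalized product is again Gaussian: its precision is the sum of
    the expert precisions and its mean is the precision-weighted average.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


def mixture_of_gaussians_moments(means, variances):
    """Mean and variance of an equally weighted mixture of 1-D Gaussians.

    The mixture itself is generally not Gaussian; only its first two
    moments are returned here.
    """
    n = len(means)
    mean = sum(means) / n
    second_moment = sum(v + m * m for m, v in zip(means, variances)) / n
    return mean, second_moment - mean * mean


# Two hypothetical modality posteriors over the same latent variable.
poe_mean, poe_var = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
moe_mean, moe_var = mixture_of_gaussians_moments([0.0, 2.0], [1.0, 1.0])
```

In this sketch a missing modality is handled by leaving its expert out of the lists. The product sharpens the posterior (its variance shrinks), whereas the mixture spreads it; both are built from per-modality terms only, which is the factorization the abstract argues against.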
Translated by Google Translate
Over the years, machine learning models have been successfully employed on neuroimaging data for accurately predicting brain age. Deviations from the healthy brain aging pattern are associated with accelerated brain aging and brain abnormalities. Hence, efficient and accurate diagnosis techniques are required for obtaining accurate brain age estimates. Several contributions have been reported in the past for this purpose, resorting to different data-driven modeling methods. Recently, deep neural networks (also referred to as deep learning) have become prevalent in many neuroimaging studies, including brain age estimation. In this review, we offer a comprehensive analysis of the literature related to the adoption of deep learning for brain age estimation with neuroimaging data. We detail and analyze the different deep learning architectures used for this application, with attention to the research works published to date that explore their application quantitatively. We also examine different brain age estimation frameworks, comparing their advantages and weaknesses. Finally, the review concludes with an outlook on future directions that prospective studies should follow. The ultimate goal of this paper is to establish a common and informed reference for newcomers and experienced researchers willing to approach brain age estimation using deep learning models.
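Recently, distributed semi-supervised learning (DSSL) algorithms have shown their effectiveness in leveraging unlabeled samples over interconnected networks, where agents cannot share their raw data with each other and can only communicate non-sensitive information to their neighbors. However, existing DSSL algorithms cannot cope with data uncertainty and may suffer from high computation and communication overhead. To address these issues, we propose a distributed semi-supervised fuzzy regression (DSFR) model with fuzzy rules and interpolation consistency regularization (ICR). ICR, which was recently proposed for semi-supervised problems, can force decision boundaries to pass through sparse data regions, thereby increasing model robustness. However, its application in distributed scenarios has not yet been considered. In this work, we propose a distributed fuzzy C-means (DFCM) method and a distributed interpolation consistency regularization (DICR), both built on the well-known alternating direction method of multipliers, to locate the parameters in the antecedent and consequent components of DSFR, respectively. Notably, the DSFR model converges very quickly because it does not involve a back-propagation procedure, and it scales to large datasets thanks to the use of DFCM and DICR. Experimental results on artificial and real-world datasets show that the proposed DSFR model achieves better performance than state-of-the-art DSSL algorithms in terms of both loss value and computational cost.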
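The idea behind interpolation consistency regularization can be sketched as follows: for two unlabeled inputs, the prediction at their interpolation is pushed toward the same interpolation of their individual predictions. This is a simplified, single-agent illustration with a scalar-output model; it is not the distributed DICR procedure, and the function name and squared-error form are assumptions.

```python
def interpolation_consistency_loss(model, u1, u2, lam):
    """Squared gap between f(mix(u1, u2)) and mix(f(u1), f(u2)).

    model: callable mapping a list of floats to a float (hypothetical).
    u1, u2: unlabeled inputs given as equal-length lists of floats.
    lam: interpolation coefficient in [0, 1].
    """
    mixed_input = [lam * a + (1.0 - lam) * b for a, b in zip(u1, u2)]
    mixed_target = lam * model(u1) + (1.0 - lam) * model(u2)
    return (model(mixed_input) - mixed_target) ** 2


# A model that is linear between the two points incurs no penalty,
# while a curved one does.
linear = lambda x: 2.0 * x[0] - x[1] + 0.5
curved = lambda x: x[0] ** 2

loss_linear = interpolation_consistency_loss(linear, [0.0, 1.0], [2.0, 3.0], 0.5)
loss_curved = interpolation_consistency_loss(curved, [0.0], [2.0], 0.5)
```

Penalizing this gap encourages the model to behave linearly between unlabeled samples, which discourages sharp changes in densely populated regions.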
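Heterogeneous big data poses many challenges in machine learning. Its enormous scale, high dimensionality, and inherent uncertainty make almost every aspect of machine learning difficult, from providing enough processing power to maintaining model accuracy to protecting privacy. Perhaps the most pressing problem, however, is that big data is often interspersed with sensitive personal data. We therefore propose a privacy-preserving hierarchical fuzzy neural network (PP-HFNN) to address these technical challenges while also alleviating privacy concerns. The network is trained with a two-stage optimization algorithm. The parameters at the low levels of the hierarchy are learned with a scheme based on the well-known alternating direction method of multipliers, which does not reveal local data to other agents. Coordination at the high levels of the hierarchy is handled by an alternating optimization method, which converges quickly. The entire training procedure is scalable and fast, and it does not suffer from the vanishing gradient problem that affects back-propagation-based methods. Comprehensive simulations on both regression and classification tasks demonstrate the effectiveness of the proposed model.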
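The multi-target self-organizing pursuit (SOP) problem has wide applications and is considered a challenging self-organization game for distributed systems, in which intelligent agents cooperatively pursue multiple dynamic targets with partial observations. This work proposes a framework for decentralized multi-agent systems to improve the search and pursuit capabilities of intelligent agents. We model a self-organizing system as a partially observable Markov game (POMG) characterized by decentralization, partial observation, and non-communication. The proposed distributed algorithm, fuzzy self-organizing cooperative coevolution (FSC2), is then leveraged to resolve the three challenges in multi-target SOP: distributed self-organizing search (SOS), distributed task allocation, and distributed single-target pursuit. FSC2 includes a coordinated multi-agent deep reinforcement learning method that enables homogeneous agents to learn natural SOS patterns. In addition, we propose a fuzzy-based distributed task allocation method that decomposes multi-target SOP into several single-target pursuit problems. The cooperative coevolution principle is used to coordinate the distributed pursuers for each single-target pursuit problem. As a result, the uncertainties arising from partial observation and distributed decision-making, which are inherent in the POMG, can be alleviated. The experimental results show that distributed, non-communicating multi-agent coordination with partial observations is effective in all three subtasks, and that 2048 FSC2 agents can perform efficient multi-target SOP with a capture rate of almost 100%.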
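This paper investigates the classification of mental-task-based brain-computer interfaces (BCI), a prime area of investigation in BCI because these systems can enhance the lives of people with severe disabilities. The performance of a BCI model depends primarily on the size of the feature vector, which is obtained through multiple channels. In mental task classification, few training samples are available. Feature selection is commonly used to raise the mental task classification rate by removing irrelevant and redundant features. This paper proposes an approach for selecting relevant and non-redundant spectral features for mental task classification. It does so using four well-known multivariate feature selection methods, viz., Bhattacharyya distance, ratio of scatter matrices, linear regression, and minimum redundancy and maximum relevance. This work also presents a comparative analysis of multivariate and univariate feature selection for mental task classification. After applying these methods, the results show a substantial improvement in the performance of the learning models for mental task classification. Moreover, the efficacy of the proposed approach is assessed with a robust ranking algorithm and Friedman's statistical test, in order to find the best combination and to compare different combinations of power spectral density and feature selection methods.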
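As a rough illustration of distance-based feature scoring, the sketch below ranks features by the Bhattacharyya distance between two classes, treating each feature independently as a univariate Gaussian. This is a deliberately simplified, per-feature version and not the multivariate procedure the abstract refers to; the function names and data layout are assumptions, and every feature is assumed to have non-zero variance in both classes.

```python
import math
from statistics import fmean, pvariance


def bhattacharyya_gaussian(x_a, x_b):
    """Bhattacharyya distance between two 1-D Gaussians fitted to x_a and x_b."""
    mu_a, mu_b = fmean(x_a), fmean(x_b)
    var_a, var_b = pvariance(x_a), pvariance(x_b)
    # Term driven by the separation of the class means.
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    # Term driven by the mismatch of the class variances.
    var_term = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


def rank_features(class_a, class_b):
    """Return feature indices sorted from most to least discriminative.

    class_a, class_b: lists of samples, each sample a list of feature values.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        scores.append(bhattacharyya_gaussian(col_a, col_b))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 barely does.
task_a = [[0.0, 1.0], [2.0, 3.0]]
task_b = [[10.0, 1.5], [12.0, 2.5]]
ranking = rank_features(task_a, task_b)
```

Scoring features one at a time ignores redundancy between them, which is exactly what multivariate criteria such as minimum redundancy and maximum relevance are meant to address.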
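This paper proposes a preference neural network (PNN) that addresses the problem of indifference preference orders using a new activation function. PNN also solves the multi-label ranking problem, in which labels may have indifference preference orders or subgroups of labels may be equally ranked. PNN follows a multi-layer feedforward architecture with fully connected neurons. Each neuron contains a novel smooth stairstep activation function based on the number of preference orders. The PNN inputs represent data features, and the output neurons represent label indexes. The proposed PNN is evaluated on a new preference mining dataset that contains repeated label values, which has not been attempted before. PNN outperforms five previously proposed methods for strict label ranking, delivering accurate results with high computational efficiency.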
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, the existing literature lacks such unified approaches for UDA tasks. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain. We further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
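For readers unfamiliar with the reported metric, the sketch below computes mean intersection-over-union (mIoU) from flat lists of predicted and ground-truth class labels. It is a minimal illustration and not the evaluation code behind the numbers above; benchmark implementations typically accumulate a confusion matrix over the whole dataset, and skipping classes that appear in neither list is an assumption made here.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    pred, target: equal-length lists of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Four pixels, two classes: one pixel of class 1 is mislabeled as class 0.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Because every class counts equally, errors on rare classes weigh as much as errors on frequent ones.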
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on the six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
Image virtual try-on aims at replacing the clothes in a person image with a garment image (in-shop clothes), and it has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images; however, occlusion remains a pernicious obstacle to realistic virtual try-on. In this work, we first present a comprehensive analysis of occlusions and categorize them into two types: i) Inherent-Occlusion: the ghost of the former clothes still exists in the try-on image; ii) Acquired-Occlusion: the target clothes are warped onto an unreasonable body part. Based on this in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing of the try-on person. Aided by semantic guidance and a pose prior, textures of various complexity are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) takes charge of synthesizing the final try-on image while jointly learning to de-occlude. Compared with state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.